Abstract

In case-based reasoning, the adaptation of a source 
case in order to solve the target problem is at the 
same time crucial and difficult to implement. The 
reason for this difficulty is that, in general, adapta- 
tion strongly depends on domain-dependent knowl- 
edge. This fact motivates research on adaptation 
knowledge acquisition (AKA). This paper presents 
an approach to AKA based on the principles and 
techniques of knowledge discovery from databases 
and data-mining. It is implemented in Cabama- 
kA, a system that explores the variations within the 
case base to elicit adaptation knowledge. This sys- 
tem has been successfully tested in an application 
of case-based reasoning to decision support in the 
domain of breast cancer treatment. 



1 Introduction 

Case-based reasoning (CBR [Riesbeck and Schank, 1989])
aims at solving a target problem thanks to a case base. A 
case represents a previously solved problem and may be seen 
as a pair (problem, solution). A CBR system selects a case 
from the case base and then adapts the associated solution, 
requiring domain-dependent knowledge for adaptation. The 
goal of adaptation knowledge acquisition (AKA) is to detect 
and extract this knowledge. This is the function of the semi- 
automatic system CabamakA, which applies principles of 
knowledge discovery from databases (KDD) to AKA, in par- 
ticular frequent itemset extraction. This paper presents the 
system CabamakA: its principles, its implementation and 
an example of adaptation rule discovered in the framework 
of an application to breast cancer treatment. The originality
of CabamakA lies essentially in an approach to AKA
that uses a powerful learning technique that is guided by a 
domain expert, according to the spirit of KDD. This paper 
proposes an original and working approach to AKA, based 
on KDD techniques. In addition, the KDD process is per- 
formed on a knowledge base itself, leading to the extraction 
of meta-knowledge, i.e. knowledge units for manipulating 
other knowledge units. This is also one of the rare papers try- 
ing to build an effective bridge between knowledge discovery 
and case-based reasoning. 



The paper is organized as follows. Section 2 presents basic
notions about CBR and adaptation. Section 3 summarizes
research on AKA. Section 4 describes the CabamakA system:
its main principles, its implementation and examples
of adaptation knowledge acquired from it. Finally, section 5
draws some conclusions and points out future work.

2 CBR and Adaptation

A case in a given CBR application encodes a problem-solving 
episode that is represented by a problem statement pb and 
an associated solution Sol(pb). The case is denoted by 
the pair (pb, Sol(pb)) in the following. Let Problems and 
Solutions be the set of problems and the set of solutions 
of the application domain, and "is a solution of" be a binary 
relation on Problems × Solutions. In general, this relation
is not completely known, but at least a finite number of
its instances (pb, Sol(pb)) is known and constitutes the case
base CB. An element of CB is called a source case and is de- 
noted by srce-case = (srce, Sol(srce)), where srce is a 
source problem. In a particular CBR session, the problem to 
be solved is called target problem, denoted by tgt. 

A case-based inference associates to tgt a solution 
Sol(tgt), with respect to the case base CB and to additional 
knowledge bases, in particular O, the domain ontology (also 
known as domain theory or domain knowledge) that usually 
introduces the concepts and terms used to represent the cases. 
It can be noticed that the research work presented in this paper 
is based on the assumption that there exists a domain ontology
associated with the case base, in the spirit of
knowledge-intensive CBR [Aamodt, 1990].

A classical decomposition of CBR relies on the 
steps of retrieval and adaptation. Retrieval selects 
(srce, Sol(srce)) ∈ CB such that srce is similar to
tgt according to some similarity criterion. The goal of adaptation
is to solve tgt by modifying Sol(srce) accordingly.
Thus, the profile of the adaptation function is

$$\mathtt{Adaptation} : ((\mathtt{srce}, \mathtt{Sol}(\mathtt{srce})), \mathtt{tgt}) \mapsto \mathtt{Sol}(\mathtt{tgt})$$

The work presented hereafter is based on the following
model of adaptation, similar to transformational analogy
[Carbonell, 1983]:

(1) (srce, tgt) ↦ Δpb, where Δpb encodes the similarities
and dissimilarities of the problems srce and tgt.

(2) (Δpb, AK) ↦ Δsol, where AK is the adaptation
knowledge and where Δsol encodes the similarities
and dissimilarities of Sol(srce) and the forthcoming
Sol(tgt).

(3) (Sol(srce), Δsol) ↦ Sol(tgt): Sol(srce) is modified
into Sol(tgt) according to Δsol.
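
The three steps above can be sketched as follows. This is only a minimal illustration, assuming that problems and solutions are plain sets of property names and that the adaptation knowledge AK is available as a function from a problem variation to a solution variation. The names `delta`, `adapt` and `toy_ak` are hypothetical and are not part of CabamakA.

```python
from typing import Callable, FrozenSet, Tuple

Props = FrozenSet[str]
# Assumption: a variation is encoded as (removed, kept, added) property sets.
Delta = Tuple[Props, Props, Props]


def delta(source: Props, target: Props) -> Delta:
    """Similarities and dissimilarities between two property sets."""
    return (source - target, source & target, target - source)


def adapt(srce: Props, sol_srce: Props, tgt: Props,
          ak: Callable[[Delta], Delta]) -> Props:
    """Three-step adaptation: problem variation, solution variation, update."""
    delta_pb = delta(srce, tgt)            # step (1)
    removed, _kept, added = ak(delta_pb)   # step (2)
    return (sol_srce - removed) | added    # step (3)


def toy_ak(delta_pb: Delta) -> Delta:
    """Hypothetical adaptation knowledge: one hard-coded rule."""
    removed, _kept, added = delta_pb
    if "a" in removed and "d" in added:
        return (frozenset({"A"}), frozenset(), frozenset({"C"}))
    return (frozenset(), frozenset(), frozenset())


sol_tgt = adapt(frozenset("abc"), frozenset("AB"), frozenset("bcd"), toy_ak)
```

In this toy run, the source solution {A, B} is turned into {B, C} because the hypothetical rule replaces A by C when a disappears from the problem and d appears.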

Adaptation is generally supposed to be domain-dependent 
in the sense that it relies on domain-specific adaptation 
knowledge. Therefore, this knowledge has to be acquired. 
This is the purpose of adaptation knowledge acquisition 
(AKA). 

3 Related Work in AKA

The notion of adaptation case is introduced
in [Leake et al., 1996]. The DIAL system is a case-based
planner in the domain of disaster response planning.
Disaster response planning is the initial strategic planning 
used to determine how to assess damage, evacuate victims, 
etc. in response to natural and man-made disasters such 
as earthquakes and chemical spills. To adapt a case, the 
DIAL system performs either a case-based adaptation or a 
rule-based adaptation. The case-based adaptation attempts 
to retrieve an adaptation case describing the successful 
adaptation of a similar previous adaptation problem. An 
adaptation case represents an adaptation as the combination 
of transformations (e.g. addition, deletion, substitution) plus 
memory search for the knowledge needed to operationalize 
the transformation (e.g. to find what to add or substitute), 
thus reifying the principle: adaptation = transformations 
+ memory search. An adaptation case in DIAL packages 
information about the context of an adaptation, the derivation 
of its solution, and the effort involved in the derivation 
process. The context information includes characteristics 
of the problem for which adaptation was generated, such as 
the type of problem, the value being adapted, and the roles 
that value fills in the response plan. The derivation records 
the operations needed to find appropriate values in memory, 
e.g. operations to extract role-fillers or other information to 
guide the memory search process. Finally, the effort records 
the actual effort expended to find the solution path. It can be 
noticed that the core idea of "transformation" is also present 
in our own adaptation knowledge extraction. 

In [Jarmulak et al., 2001], an approach to AKA is presented
that produces a set of adaptation cases, where an adaptation
case is the representation of a particular adaptation process.
The adaptation case base, $\mathtt{CB}_A$, is then used for further adaptation
steps: an adaptation step itself is based on CBR, reusing
the adaptation cases of $\mathtt{CB}_A$. $\mathtt{CB}_A$ is built as follows. For
each (srce₁, Sol(srce₁)) ∈ CB, the retrieval step of the CBR
system using the case base CB without (srce₁, Sol(srce₁))
returns a case (srce₂, Sol(srce₂)). Then, an adaptation
case is built based on both source cases and is added to $\mathtt{CB}_A$.
This adaptation case encodes srce₁, Sol(srce₁), the difference
between srce₁ and srce₂ (Δpb, with the notations
of this paper) and the difference between Sol(srce₁) and
Sol(srce₂) (Δsol). This approach to AKA and CBR has
been successfully tested for an application to the design of
tablet formulation.
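
The leave-one-out construction described above can be illustrated by the following sketch. It assumes that problems and solutions are sets of property names and that retrieval simply picks the remaining case sharing the most problem properties; this retrieval criterion and the names `overlap` and `build_adaptation_cases` are assumptions made for the illustration, not the method of the cited work.

```python
def overlap(pb_1, pb_2):
    """Hypothetical similarity: number of shared problem properties."""
    return len(pb_1 & pb_2)


def build_adaptation_cases(case_base):
    """Build one adaptation case per source case, by leave-one-out retrieval."""
    adaptation_cases = []
    for index, (srce_1, sol_1) in enumerate(case_base):
        # Case base without the current case.
        others = case_base[:index] + case_base[index + 1:]
        srce_2, sol_2 = max(others, key=lambda case: overlap(srce_1, case[0]))
        adaptation_cases.append({
            "problem": srce_1,
            "solution": sol_1,
            # Differences encoded as (only in case 1, only in case 2).
            "delta_pb": (srce_1 - srce_2, srce_2 - srce_1),
            "delta_sol": (sol_1 - sol_2, sol_2 - sol_1),
        })
    return adaptation_cases


toy_case_base = [
    (frozenset("abc"), frozenset("AB")),
    (frozenset("bcd"), frozenset("BC")),
    (frozenset("xyz"), frozenset("Z")),
]
cb_a = build_adaptation_cases(toy_case_base)
```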



The idea of the research presented
in [Hanney and Keane, 1996; Hanney, 1997] is to exploit
the variations between source cases to learn adaptation
rules. These rules compute variations on solutions from
variations on problems. More precisely, ordered pairs
(srce-case₁, srce-case₂) of similar source cases are
formed. Then, for each of these pairs, the variations between
the problems srce₁ and srce₂ and the solutions Sol(srce₁)
and Sol(srce₂) are represented (Δpb and Δsol). Finally,
the adaptation rules are learned, using as training set the
set of the input-output pairs (Δpb, Δsol). This approach
has been tested in two domains: the estimation of the price 
of flats and houses, and the prediction of the rise time of 
a servo mechanism. The experiments have shown that the 
CBR system using the adaptation knowledge acquired from 
the automatic system of AKA shows a better performance 
compared to the CBR system working without adaptation. 
This research has influenced our work that is globally based 
on similar ideas. 

[Shiu et al., 2001] proposes a method for case base maintenance
that reduces the case base to a set of representative
cases together with a set of general adaptation rules. These 
rules handle the perturbation between representative cases 
and the other ones. They are generated by a fuzzy decision 
tree algorithm using the pairs of similar source cases as a 
training set. 

In [Wiratunga et al., 2002], the idea
of [Hanney and Keane, 1996] is reused to extend the
approach of [Jarmulak et al., 2001]: some learning algorithms
(in particular, C4.5) are applied to the adaptation cases
of $\mathtt{CB}_A$, to induce general adaptation knowledge.

These approaches to AKA share the idea of exploiting
adaptation cases. For some of them ([Jarmulak et al., 2001;
Leake et al., 1996]), the adaptation cases themselves constitute
the adaptation knowledge (and adaptation is itself a
CBR process). For the other ones ([Hanney and Keane, 1996;
Shiu et al., 2001; Wiratunga et al., 2002]), as for the approach
presented in this paper, the adaptation cases are the
input of a learning process.

4 CabamakA 

We now present the CabamakA system, for acquiring adap- 
tation knowledge. The CabamakA system is at present 
working in the medical domain of cancer treatment, but it 
may be reused in other application domains where there exist 
problems to be solved by a CBR system. 

4.1 Principles 

CabamakA deals with case base mining for AKA. Although
the main ideas underlying CabamakA are shared with those
presented in [Hanney and Keane, 1996], the following ones are
original. The adaptation knowledge that is mined has
to be validated by experts and has to be associated with ex- 
planations making it understandable by the user. In this way, 
CabamakA may be considered as a semi-automated (or in- 
teractive) learning system. This is a necessary requirement 
for the medical domain for which CabamakA has been ini- 
tially designed. 



Moreover, the system takes into account every ordered
pair (srce-case₁, srce-case₂) with srce-case₁ ≠
srce-case₂, which leads to examining n(n − 1) pairs of cases
for a case base CB where |CB| = n. In practice, this
number may be rather large since in the present application
n ≈ 650 (n(n − 1) ≈ 4 · 10⁵). This is one reason
for choosing for this system efficient KDD techniques
such as Charm [Zaki and Hsiao, 2002]. This is different
from the approach of [Hanney and Keane, 1996], where only
pairs of similar source cases are considered, according to a
fixed criterion. In CabamakA, there is no similarity criterion
on which a selection of pairs of cases to be compared
could be carried out. Indeed, the CBR process in
CabamakA relies on the adaptation-guided retrieval principle
[Smyth and Keane, 1996], where only adaptable cases
are retrieved. Thus, every pair of cases may be of interest,
and two cases may appear to be similar w.r.t. a given point of
view, and dissimilar w.r.t. another one.
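
As a quick check of the order of magnitude mentioned above, the enumeration of ordered pairs of distinct cases can be sketched with the standard library. The helper names `ordered_pairs` and `pair_count` are hypothetical, and cases are represented by opaque labels for the illustration.

```python
from itertools import permutations


def ordered_pairs(case_base):
    """Every ordered pair (case_1, case_2) of distinct cases."""
    return permutations(case_base, 2)


def pair_count(n):
    """Number of ordered pairs of distinct cases for a case base of size n."""
    return n * (n - 1)


toy_case_base = ["case-1", "case-2", "case-3"]
pairs = list(ordered_pairs(toy_case_base))
```

For n = 650, `pair_count` gives 421850, which is consistent with the approximation 4 · 10⁵ given in the text.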

Principles of KDD. The goal of KDD is to discover knowledge
from databases, under the supervision of an analyst (expert
of the domain). A KDD session usually relies on three
main steps: data preparation, data-mining, and interpretation 
of the extracted pieces of information. 

Data preparation is mainly based on formatting and filter- 
ing operations. The formatting operations are used to trans- 
form the data into a form allowing the application of the cho- 
sen data-mining operations. The filtering operations are used 
for removing noisy data and for focusing the data-mining op- 
eration on special subsets of objects and/or attributes. 

Data-mining algorithms are applied for extracting
from data information units showing some regularities
[Hand et al., 2001]. In the present experiment, the Charm
data-mining algorithm, which efficiently performs the extraction
of frequent closed itemsets (FCIs), has been used
[Zaki and Hsiao, 2002]. Charm inputs a formal database,
i.e. a set of binary transactions, where each transaction T is
a set of binary items. An itemset I is a set of items, and the
support of I, support(I), is the proportion of transactions
T of the database possessing I (I ⊆ T). I is frequent, with
respect to a threshold σ ∈ [0; 1], whenever support(I) ≥ σ.
I is closed if it has no proper superset J (I ⊊ J) with the
same support.
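
The definitions of support, frequency and closedness can be made concrete with a small brute-force sketch. It is not the Charm algorithm: it enumerates every itemset, which is only workable on toy data, and the function names are hypothetical.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in transactions if itemset <= transaction)


def frequent_closed_itemsets(transactions, sigma):
    """Map each frequent closed itemset to its support (brute force)."""
    items = sorted(set().union(*transactions))
    minimum = sigma * len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = support_count(candidate, transactions)
            if count >= minimum:
                frequent[candidate] = count
    # An itemset is closed if no proper superset has the same support.
    # Such a superset would itself be frequent, so checking `frequent` suffices.
    return {
        itemset: count / len(transactions)
        for itemset, count in frequent.items()
        if not any(itemset < other and count == frequent[other]
                   for other in frequent)
    }


database = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
fcis = frequent_closed_itemsets(database, sigma=0.5)
```

On this toy database, {c} is frequent but not closed, since {a, c} has the same support.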

The interpretation step aims at interpreting the extracted 
pieces of information, i.e. the FCIs in the present case, with 
the help of an analyst. In this way, the interpretation step 
produces new knowledge units (e.g. rules). 

The CabamakA system relies on these main KDD steps
as explained below. 

Formatting. The formatting step of CabamakA inputs
the case base CB and outputs a set of transactions obtained
from the pairs (srce-case₁, srce-case₂). It is composed
of two substeps. During the first substep, each srce-case =
(srce, Sol(srce)) ∈ CB is formatted in two sets of boolean
properties: Φ(srce) and Φ(Sol(srce)). The computation of
Φ(srce) consists in translating srce from the problem representation
formalism to $2^V$, V being a set of boolean properties.
Some information may be lost during this translation,
for example, when translating a continuous property into a
set of boolean properties, but this loss has to be minimized.
Now, this translation formats an expression srce expressed
in the framework of the domain ontology O to an expression
Φ(srce) that will be manipulated as data, i.e. without the use
of a reasoning process. Therefore, in order to minimize the
translation loss, it is assumed that



$$\text{if } p \in \Phi(\mathtt{srce}) \text{ and } p \models_O q \text{ then } q \in \Phi(\mathtt{srce}) \qquad (1)$$



for each p, q ∈ V (where $p \models_O q$ stands for "q is a consequence
of p in the ontology O"). In other words, Φ(srce) is
assumed to be deductively closed given O in the set V. The
same assumption is made for Φ(Sol(srce)). How this first
substep of formatting is computed in practice depends heavily
on the representation formalism of the cases and is presented,
for our application, in section 4.2.
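
The deductive closure assumption (1) can be illustrated by a small sketch. It assumes that the consequences stated by the ontology are given as a dictionary mapping each property to its direct consequences; the function `deductive_closure` and this dictionary encoding are hypothetical simplifications for the illustration.

```python
def deductive_closure(properties, direct_consequences):
    """Smallest superset of `properties` closed under the given consequences."""
    closed = set(properties)
    frontier = list(properties)
    while frontier:
        p = frontier.pop()
        for q in direct_consequences.get(p, ()):
            if q not in closed:
                closed.add(q)
                frontier.append(q)
    return frozenset(closed)


# Hypothetical ontology fragment: q is a consequence of p, r of q.
consequences = {"p": ["q"], "q": ["r"]}
phi_srce = deductive_closure({"p", "s"}, consequences)
```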

The second substep of formatting produces a transaction
T = Φ((srce-case₁, srce-case₂)) for each ordered pair
of distinct source cases, based on the sets of items Φ(srce₁),
Φ(srce₂), Φ(Sol(srce₁)) and Φ(Sol(srce₂)). Following
the model of adaptation presented in section 2 (items (1), (2)
and (3)), T has to encode the properties of Δpb and Δsol.
Δpb encodes the similarities and dissimilarities of srce₁ and
srce₂, i.e.:

• The properties common to srce₁ and srce₂ (marked by
"="),

• The properties of srce₁ that srce₂ does not share ("-"),
and

• The properties of srce₂ that srce₁ does not share ("+").

All these properties are related to problems and thus are
marked by pb. Δsol is computed in a similar way and
T = Δpb ∪ Δsol. For example,



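$$\begin{array}{ll}
\text{if} & \Phi(\mathtt{srce}_1) = \{a, b, c\}, \quad \Phi(\mathtt{Sol}(\mathtt{srce}_1)) = \{A, B\}, \\
 & \Phi(\mathtt{srce}_2) = \{b, c, d\}, \quad \Phi(\mathtt{Sol}(\mathtt{srce}_2)) = \{B, C\} \\
\text{then} & T = \{a^{-}_{\mathtt{pb}}, b^{=}_{\mathtt{pb}}, c^{=}_{\mathtt{pb}}, d^{+}_{\mathtt{pb}}, A^{-}_{\mathtt{sol}}, B^{=}_{\mathtt{sol}}, C^{+}_{\mathtt{sol}}\}
\end{array} \qquad (2)$$

More generally:

$$\begin{aligned}
T = {} & \{p^{-}_{\mathtt{pb}} \mid p \in \Phi(\mathtt{srce}_1) \setminus \Phi(\mathtt{srce}_2)\} \\
{} \cup {} & \{p^{=}_{\mathtt{pb}} \mid p \in \Phi(\mathtt{srce}_1) \cap \Phi(\mathtt{srce}_2)\} \\
{} \cup {} & \{p^{+}_{\mathtt{pb}} \mid p \in \Phi(\mathtt{srce}_2) \setminus \Phi(\mathtt{srce}_1)\} \\
{} \cup {} & \{p^{-}_{\mathtt{sol}} \mid p \in \Phi(\mathtt{Sol}(\mathtt{srce}_1)) \setminus \Phi(\mathtt{Sol}(\mathtt{srce}_2))\} \\
{} \cup {} & \{p^{=}_{\mathtt{sol}} \mid p \in \Phi(\mathtt{Sol}(\mathtt{srce}_1)) \cap \Phi(\mathtt{Sol}(\mathtt{srce}_2))\} \\
{} \cup {} & \{p^{+}_{\mathtt{sol}} \mid p \in \Phi(\mathtt{Sol}(\mathtt{srce}_2)) \setminus \Phi(\mathtt{Sol}(\mathtt{srce}_1))\}
\end{aligned}$$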
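
The construction of a transaction from an ordered pair of formatted cases can be sketched as follows. Items are encoded here as (property, mark, kind) triples, with mark in {"-", "=", "+"} and kind in {"pb", "sol"}; this encoding and the names `marked` and `transaction` are assumptions made for the illustration.

```python
def marked(props_1, props_2, kind):
    """Mark each property as lost ('-'), shared ('=') or gained ('+')."""
    return (
        {(p, "-", kind) for p in props_1 - props_2}
        | {(p, "=", kind) for p in props_1 & props_2}
        | {(p, "+", kind) for p in props_2 - props_1}
    )


def transaction(case_1, case_2):
    """Transaction encoding the problem and solution variations of a pair."""
    (pb_1, sol_1), (pb_2, sol_2) = case_1, case_2
    return frozenset(marked(pb_1, pb_2, "pb") | marked(sol_1, sol_2, "sol"))


# The example of the text: Phi(srce_1) = {a, b, c}, Phi(Sol(srce_1)) = {A, B}, etc.
case_1 = (frozenset("abc"), frozenset("AB"))
case_2 = (frozenset("bcd"), frozenset("BC"))
T = transaction(case_1, case_2)
```

Since the pair is ordered, `transaction(case_2, case_1)` yields a different transaction in which the "-" and "+" marks are exchanged.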



Filtering. The filtering operations may take place before, 
between and after the formatting substeps, and also after the 
mining step. They are guided by the analyst. 

Mining. The extraction of FCIs is computed thanks to
Charm (in fact, thanks to a tool based on a CHARM-like
algorithm) from the set of transactions. A transaction T =
Φ((srce-case₁, srce-case₂)) encodes a specific adaptation
((srce₁, Sol(srce₁)), srce₂) ↦ Sol(srce₂). For example,
consider the following FCI:

$$I = \{a^{-}_{\mathtt{pb}}, c^{=}_{\mathtt{pb}}, d^{+}_{\mathtt{pb}}, A^{-}_{\mathtt{sol}}, B^{=}_{\mathtt{sol}}, C^{+}_{\mathtt{sol}}\} \qquad (3)$$

I can be considered as a generalization of a subset of the
transactions including the transaction T of equation (2): I ⊆
T. The interpretation of this FCI as an adaptation rule is explained
below.

Interpretation. The interpretation step is supervised by the 
analyst. The CabamakA system provides the analyst with 
the extracted FCIs and facilities for navigating among them. 
The analyst may select an FCI, say I, and interpret I as an 
adaptation rule. For example, the FCI in equation (3) may be
interpreted in the following terms: 

if a is a property of srce but is not a property of tgt, 
c is a property of both srce and tgt, 
d is not a property of srce but is a property of tgt, 
A and B are properties of Sol(srce), and 
C is not a property of Sol(srce) 
then the properties of Sol(tgt) are 

Φ(Sol(tgt)) = (Φ(Sol(srce)) \ {A}) ∪ {C}.

This rule has to be translated from the formalism $2^V$ (sets of
boolean properties) to the formalism of the adaptation rules of 
the CBR system. The result is an adaptation rule, i.e. a rule 
whose left part represents conditions on srce, Sol(srce) 
and tgt and whose right part represents a way to compute 
Sol(tgt). The role of the analyst is to correct and to validate 
this adaptation rule and to associate an explanation with it. 
The analyst is helped in this task by the domain ontology O 
that is useful for organizing the FCIs and by the already avail- 
able adaptation knowledge that is useful for pruning from the 
FCIs the ones that are already known adaptation knowledge. 
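
The reading of an FCI as an adaptation rule can be illustrated by the following sketch, which works directly on sets of boolean properties, before any translation into the formalism of the CBR system. Items are encoded as (property, mark, kind) triples; this encoding and the names `rule_applies` and `apply_rule` are hypothetical.

```python
# Expected (in srce, in tgt) membership for each mark on a problem property.
PB_CONDITIONS = {"-": (True, False), "=": (True, True), "+": (False, True)}


def rule_applies(fci, srce, sol_srce, tgt):
    """Check the conditions that the FCI puts on srce, Sol(srce) and tgt."""
    for prop, mark, kind in fci:
        if kind == "pb":
            if (prop in srce, prop in tgt) != PB_CONDITIONS[mark]:
                return False
        else:
            # '-' and '=' items must be in Sol(srce); '+' items must not.
            if (prop in sol_srce) != (mark != "+"):
                return False
    return True


def apply_rule(fci, sol_srce):
    """Remove the '-' solution properties and add the '+' ones."""
    removed = {p for p, mark, kind in fci if kind == "sol" and mark == "-"}
    added = {p for p, mark, kind in fci if kind == "sol" and mark == "+"}
    return (sol_srce - removed) | added


# The FCI used as an example in the text.
fci = frozenset({
    ("a", "-", "pb"), ("c", "=", "pb"), ("d", "+", "pb"),
    ("A", "-", "sol"), ("B", "=", "sol"), ("C", "+", "sol"),
})
srce, sol_srce, tgt = frozenset("abc"), frozenset("AB"), frozenset("bcd")
sol_tgt = apply_rule(fci, sol_srce) if rule_applies(fci, srce, sol_srce, tgt) else None
```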

4.2 Implementation 

The CabamakA discovery process relies on the steps described
in the previous section: (s₁) input the case base, (s₂)
select a subset of it (or take the whole case base): first filtering
step, (s₃) first formatting substep, (s₄) second filtering
step, (s₅) second formatting substep, (s₆) third filtering step,
(s₇) data-mining (Charm), (s₈) last filtering step and (s₉)
interpretation. This process is interactive and iterative: the
analyst runs each of the steps (sᵢ) (and can interrupt it), and can go
back to a previous step at any moment.

Among these steps, only the first ones ((s₁) to (s₃)) and
the last one are dependent on the representation formalism.
In the following, the step (s₃) is illustrated in the context of
an application. First, some elements on the application itself 
and the associated knowledge representation formalism are 
introduced. 

Application domain. The application domain of the CBR
system we are developing is breast cancer treatment: in this 
application, a problem pb describes a class of patients with a 
set of attributes and associated constraints (holding on the age 
of the patient, the size and the localization of the tumor, etc.). 



A solution Sol(pb) of pb is a set of therapeutic decisions (in 
surgery, chemotherapy, radiotherapy, etc.). 

Two features of this application must be pointed out. First, 
the source cases are general cases (or ossified cases according 
to the terminology of [Riesbeck and Schank, 1989]): a source
case corresponds to a class of patients and not to a single 
one. These source cases are obtained from statistical studies 
in the cancer domain. Second, the requested behavior of the 
CBR system is to provide a treatment and explanations on this 
treatment proposal. This is why the analyst is required to 
associate an explanation to a discovered adaptation rule. 

Representation of cases and of the domain ontology
O. The problems, the solutions, and the domain ontology
of the application are represented in a light extension of
OWL DL (the Web Ontology Language recommended by the
W3C [Staab and Studer, 2004]). The parts of the underlying
description logic that are useful for this paper are presented
below (other elements on description logics, DLs, may be
found in [Staab and Studer, 2004]).
Let us consider the following example:

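$$\mathtt{srce} \equiv \mathtt{Patient} \sqcap \exists\mathtt{age}.{\geq}45 \sqcap \exists\mathtt{age}.{<}70 \sqcap \exists\mathtt{tumor}.(\exists\mathtt{size}.{\geq}4 \sqcap \exists\mathtt{localization}.\texttt{Left-Breast}) \qquad (4)$$

srce represents the class of patients with an age a ∈ [45; 70[,
and a tumor of size S ≥ 4 centimeters localized in the left
breast.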

The DL representation entities used here are atomic and
defined concepts (e.g. srce, Patient and ∃age.≥45), roles
(e.g. tumor and localization), concrete roles (e.g. age and
size) and constraints (e.g. ≥45 and <70). A concept C is an
expression representing a class of objects. A role r is a name
representing a binary relation between objects. A concrete
role g is a name representing a function associating a real
number to an object (for this simplified presentation, the only
concrete domain that is considered is (ℝ, ≤), the ordered set
of real numbers). A constraint c represents a subset of ℝ denoted
by $c^{\mathbb{R}}$. For example, intervals such as ≥45 = [45; +∞[
and <70 = ]−∞; 70[ introduce constraints that are used in the
application.

A concept is either atomic (a concept name) or defined.
A defined concept is an expression of the following form:
C ⊓ D, ∃r.C or ∃g.c, where C and D are concepts, r is
a role, g is a concrete role and c is a constraint (many
other constructions exist in the DL, but only these three constructions
are used here). Following classical DL presentations
[Staab and Studer, 2004], an ontology O is a set of axioms,
where an axiom is a formula of the form C ⊑ D (general
concept inclusion) or of the form C ≡ D, where C and D are
two concepts.

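The semantics of the DL expressions used hereafter can
be read as follows. An interpretation is a pair $\mathcal{I} = (\Delta_{\mathcal{I}}, \cdot^{\mathcal{I}})$
where $\Delta_{\mathcal{I}}$ is a non-empty set (the interpretation domain) and
$\cdot^{\mathcal{I}}$ is the interpretation function, which maps a concept C to a
set $C^{\mathcal{I}} \subseteq \Delta_{\mathcal{I}}$, a role r to a binary relation $r^{\mathcal{I}} \subseteq \Delta_{\mathcal{I}} \times \Delta_{\mathcal{I}}$,
and a concrete role g to a function $g^{\mathcal{I}} : \Delta_{\mathcal{I}} \to \mathbb{R}$. In
the following, all roles r are assumed to be functional: $\cdot^{\mathcal{I}}$
maps r to a function $r^{\mathcal{I}} : \Delta_{\mathcal{I}} \to \Delta_{\mathcal{I}}$. The interpretation
of the defined concepts, for an interpretation $\mathcal{I}$, is as follows:
$(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$, $(\exists r.C)^{\mathcal{I}}$ is the set of objects $x \in \Delta_{\mathcal{I}}$
such that $r^{\mathcal{I}}(x) \in C^{\mathcal{I}}$ and $(\exists g.c)^{\mathcal{I}}$ is the set of objects $x \in \Delta_{\mathcal{I}}$
such that $g^{\mathcal{I}}(x) \in c^{\mathbb{R}}$. An interpretation $\mathcal{I}$ is a model
of an axiom $C \sqsubseteq D$ (resp. $C \equiv D$) if $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ (resp.
$C^{\mathcal{I}} = D^{\mathcal{I}}$). $\mathcal{I}$ is a model of an ontology O if it is a model of each
axiom of O. The inference associated with this representation
formalism that is used below is the subsumption test: given an
ontology O, a concept C is subsumed by a concept D, denoted
by $\models_O C \sqsubseteq D$, if for every model $\mathcal{I}$ of O, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$.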

More practically, the problems of the CBR application are
represented by concepts (as srce in (4)). A therapeutic decision
dec is also represented by a concept. A solution is
a finite set {dec₁, dec₂, ..., decₖ} of decisions. The decisions
of the system are represented by atomic concepts.
The knowledge associated with atomic concepts (and hence,
with therapeutic decisions) is given by axioms of the domain
ontology O. For example, the decision in surgery
dec = Partial-Mastectomy represents a partial ablation
of the breast:

$$\begin{aligned}
\texttt{Partial-Mastectomy} &\sqsubseteq \mathtt{Mastectomy} \\
\mathtt{Mastectomy} &\sqsubseteq \mathtt{Surgery} \\
\mathtt{Surgery} &\sqsubseteq \texttt{Therapeutic-Decision}
\end{aligned} \qquad (5)$$

Implementation of the first formatting substep (s₃).
Both problems and decisions constituting solutions are represented
by concepts. Thus, computing Φ(srce) and
Φ(Sol(srce)) amounts to the computation of Φ(C), C being
a concept. A property p is an element of the finite
set V (see section 4.1). In the DL formalism, p is represented
by a concept P. A concept C has the property p if
$\models_O C \sqsubseteq P$. The set of boolean properties and the set of the
corresponding concepts are both denoted by V in the following.
Given V, Φ(C) is simply defined as the set of properties
P ∈ V that C has:

$$\Phi(C) = \{P \in V \mid \models_O C \sqsubseteq P\} \qquad (6)$$

As a consequence, if P ∈ Φ(C), Q ∈ V and $\models_O P \sqsubseteq Q$ then
Q ∈ Φ(C). Thus, the implication (1) is satisfied.

The algorithm of the first formatting substep that has been
implemented first computes the Φ(C)'s for C: the source problems
and the decisions occurring in their solutions, and then
computes V as the union of the Φ(C)'s. This algorithm relies
on the following set of equations:

$$\begin{aligned}
\Phi(A) &= \{B \mid B \text{ is an atomic concept occurring in } \mathtt{KB} \text{ and } \models_O A \sqsubseteq B\} \\
\Phi(C \sqcap D) &= \Phi(C) \cup \Phi(D) \\
\Phi(\exists r.C) &= \{\exists r.P \mid P \in \Phi(C)\} \\
\Phi(\exists g.c) &= \{\exists g.d \mid d \in \mathtt{Cstraints}_g \text{ and } c^{\mathbb{R}} \subseteq d^{\mathbb{R}}\} \\
\mathtt{Cstraints}_g &= \{c \mid \text{the expression } \exists g.c \text{ occurs in } \mathtt{KB}\}
\end{aligned}$$

where A is an atomic concept, C and D are (either atomic
or defined) concepts, r is a role, g is a concrete role, c is a
constraint and KB, the knowledge base, is the union of the
case base and of the domain ontology. This set of equations
can itself be seen as a recursive algorithm, but it is not very
efficient since it computes the same things several times. The
implemented algorithm avoids these recalculations by the use
of a cache.

It can be proven that the algorithm for the first formatting
substep (computing the Φ(C)'s and the set of properties V)
respects (6) under the following hypotheses. First, the constructions
used in the DL are the ones that have been introduced
above (C ⊓ D, ∃r.C and ∃g.c, where r is functional).
Second, no defined concept may strictly subsume an atomic
concept (for every atomic concept A, there is no defined concept
C such that $\models_O A \sqsubseteq C$ and $\not\models_O A \equiv C$). Under these
hypotheses, (6) can be proven by a recursion on the size of
C (this size is the number of constructions that C contains).
These hypotheses hold for our application. However, an ongoing
study aims at finding an algorithm for computing the
Φ(C)'s and V in a more expressive DL, including in particular
negation and disjunction of concepts.

For example, let srce be the problem introduced by the axiom
(4). It is assumed that the constraints associated with the
concrete role age in KB are <30, ≥30, <45, ≥45, <70 and ≥70,
that the constraints associated with the concrete role size in
KB are <4 and ≥4, that there is no concept A ≠ Patient in
KB such that $\models_O \mathtt{Patient} \sqsubseteq A$, and that the only concept
A ≠ Left-Breast of KB such that $\models_O \texttt{Left-Breast} \sqsubseteq A$ is
A = Breast. Then, the implemented algorithm returns:

$$\begin{aligned}
\Phi(\mathtt{srce}) = \{ & \mathtt{Patient},\ \exists\mathtt{age}.{\geq}30,\ \exists\mathtt{age}.{\geq}45,\ \exists\mathtt{age}.{<}70, \\
& \exists\mathtt{tumor}.\exists\mathtt{size}.{\geq}4, \\
& \exists\mathtt{tumor}.\exists\mathtt{localization}.\texttt{Left-Breast}, \\
& \exists\mathtt{tumor}.\exists\mathtt{localization}.\mathtt{Breast} \}
\end{aligned}$$

And the 7 elements of Φ(srce) are added to V.
Another example, based on the set of axioms (5), is:

$$\Phi(\texttt{Partial-Mastectomy}) = \{\texttt{Partial-Mastectomy}, \mathtt{Mastectomy}, \mathtt{Surgery}, \texttt{Therapeutic-Decision}\}$$
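
The recursive computation of Φ with a cache can be sketched as follows, on the two examples above. This is a simplified illustration and not the implemented algorithm: concepts are encoded as nested tuples, the ontology is reduced to a table of direct atomic subsumers (`PARENTS`), constraints are (operator, bound) pairs, and the cache is the standard `functools.lru_cache`. All these names and encodings are assumptions.

```python
from functools import lru_cache

# Assumed ontology fragment: atomic concept -> directly subsuming atomic concepts.
PARENTS = {
    "Left-Breast": ("Breast",),
    "Partial-Mastectomy": ("Mastectomy",),
    "Mastectomy": ("Surgery",),
    "Surgery": ("Therapeutic-Decision",),
}
# Constraints occurring in KB for each concrete role, as (operator, bound).
CSTRAINTS = {
    "age": (("<", 30), (">=", 30), ("<", 45), (">=", 45), ("<", 70), (">=", 70)),
    "size": (("<", 4), (">=", 4)),
}


def included(c, d):
    """True if the interval denoted by c is a subset of the one denoted by d."""
    (op_c, bound_c), (op_d, bound_d) = c, d
    if op_c != op_d:
        return False
    return bound_c <= bound_d if op_c == "<" else bound_c >= bound_d


@lru_cache(maxsize=None)
def phi(concept):
    """Set of properties of a concept, following the equations above."""
    tag = concept[0]
    if tag == "atom":                      # Phi(A)
        result = {concept[1]}
        for parent in PARENTS.get(concept[1], ()):
            result |= phi(("atom", parent))
        return frozenset(result)
    if tag == "and":                       # Phi(C and D)
        return phi(concept[1]) | phi(concept[2])
    if tag == "some":                      # Phi(exists r.C)
        return frozenset(f"∃{concept[1]}.{p}" for p in phi(concept[2]))
    if tag == "csome":                     # Phi(exists g.c)
        g, c = concept[1], concept[2]
        return frozenset(f"∃{g}.{op}{bound}"
                         for op, bound in CSTRAINTS[g] if included(c, (op, bound)))
    raise ValueError(f"unknown construction: {tag}")


TUMOR = ("and", ("csome", "size", (">=", 4)),
         ("some", "localization", ("atom", "Left-Breast")))
SRCE = ("and", ("atom", "Patient"),
        ("and", ("csome", "age", (">=", 45)),
         ("and", ("csome", "age", ("<", 70)), ("some", "tumor", TUMOR))))
phi_srce = phi(SRCE)
```

Under these assumptions, `phi_srce` contains the 7 properties listed above, and `phi(("atom", "Partial-Mastectomy"))` contains the 4 concepts of the second example.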

4.3 Results 

The CabamakA process piloted by the analyst produces a
set of FCIs. With n = 647 cases and σ = 10%, CabamakA
has given 2344 FCIs in about 2 minutes (on a current PC).
Only the FCIs with at least a + or a - in both problem properties
and solution properties were kept, which corresponds
to 208 FCIs. Each of these FCIs I is presented for interpretation
to the analyst under a simplified form by removing
some of the items that can be deduced from the ontology. In
particular if $P^{=}_{\mathtt{pb}} \in I$, $Q^{=}_{\mathtt{pb}} \in I$ and $\models_O P \sqsubseteq Q$ then $Q^{=}_{\mathtt{pb}}$ is
removed from I. For example, if P = (∃age.≥45) ∈ V,
Q = (∃age.≥30) ∈ V and $(\exists\mathtt{age}.{\geq}45)^{=}_{\mathtt{pb}} \in I$, then, necessarily,
$(\exists\mathtt{age}.{\geq}30)^{=}_{\mathtt{pb}} \in I$, which is a redundant piece of
information.
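
The selection and simplification of FCIs described above can be sketched as follows. Items are encoded as (property, mark, kind) triples and the subsumption information is given as a dictionary of strict subsumers; both encodings, as well as the names `has_variation` and `simplify`, are assumptions made for the illustration.

```python
def has_variation(fci):
    """Keep an FCI only if it has a '+' or '-' item on both problem and solution sides."""
    return all(
        any(kind == wanted and mark in ("+", "-") for _prop, mark, kind in fci)
        for wanted in ("pb", "sol")
    )


def simplify(fci, subsumers):
    """Drop shared problem items that are implied by a more specific shared item."""
    redundant = {
        (q, "=", "pb")
        for prop, mark, kind in fci
        if mark == "=" and kind == "pb"
        for q in subsumers.get(prop, ())
    }
    return fci - redundant


# Hypothetical subsumption table: age >= 45 is subsumed by age >= 30.
subsumers = {"age>=45": {"age>=30"}}
fci = frozenset({
    ("age>=45", "=", "pb"), ("age>=30", "=", "pb"),
    ("size<4", "-", "pb"), ("size>=4", "+", "pb"),
    ("Partial-Mastectomy", "-", "sol"), ("Radical-Mastectomy", "+", "sol"),
})
simplified = simplify(fci, subsumers)
```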

The following FCI has been extracted from CabamakA: 

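$$\begin{aligned}
I = \{ & (\exists\mathtt{age}.{<}70)^{=}_{\mathtt{pb}}, \\
& (\exists\mathtt{tumor}.\exists\mathtt{size}.{<}4)^{-}_{\mathtt{pb}},\ (\exists\mathtt{tumor}.\exists\mathtt{size}.{\geq}4)^{+}_{\mathtt{pb}}, \\
& \mathtt{Curettage}^{=}_{\mathtt{sol}},\ \mathtt{Mastectomy}^{=}_{\mathtt{sol}}, \\
& \texttt{Partial-Mastectomy}^{-}_{\mathtt{sol}},\ \texttt{Radical-Mastectomy}^{+}_{\mathtt{sol}} \}
\end{aligned}$$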



It has been interpreted in the following way: if srce and tgt 
both represent classes of patients of less than 70 years old, if 
the difference between srce and tgt lies in the tumor size of 
the patients — less than 4 cm for the ones of srce and more 
than 4 cm for the ones of tgt — and if a partial mastectomy 
and a curettage of the lymph nodes are proposed for the srce, 
then Sol(tgt) is obtained by substituting in Sol(srce) the 
partial mastectomy by a radical one. 

It must be noticed that this example has been chosen for its 
simplicity: other adaptation rules have been extracted that are 
less easy to understand. More substantial experiments have 
to be carried out for an effective evaluation. 

The choice of considering every pair of distinct source
cases can be discussed. Another version of CabamakA
has been tested that considers only similar source cases, as
in [Hanney and Keane, 1996]: only the pairs of source cases
such that |Φ(srce₁) ∩ Φ(srce₂)| ≥ k were considered (experimented
with k = 1 to k = 10). The first experiments have
not yet shown any improvement in the results, compared to
the version without this constraint (k = +∞), and this variant
requires the threshold k to be fixed.

5 Conclusion and Future Work 

The CabamakA system presented in this paper is inspired
by the research of Kathleen Hanney and Mark T.
Keane [Hanney and Keane, 1996] and by the principles of
KDD for the purpose of semi-automatic adaptation knowledge
acquisition. It reuses an FCI extraction tool developed in our
team and based on a CHARM-like algorithm. Although implemented
for a specific application to breast cancer treatment
decision support, it has been designed to be reusable
for other CBR applications: only a few modules of CabamakA
are dependent on the formalism of the cases and of the
domain ontology, and this formalism, OWL DL, is a well-known
standard.

One element of future work consists in searching for ways 
of simplifying the presentation of the numerous extracted 
FCIs to the analyst. This involves an organization of these 
FCIs for the purpose of navigation among them. Such an 
organization can be a hierarchy of FCIs according to their 
specificities or a clustering of the FCIs in themes. 

A second piece of future work, still for the purpose of
helping the analyst, is to study the algebraic structure of all
the possible adaptation rules associated with the operation of
composition: r is a composition of r₁ and r₂ if adapting
(srce, Sol(srce)) to solve tgt thanks to r gives the same
solution Sol(tgt) as (1) solving a problem pb by adaptation
of (srce, Sol(srce)) thanks to r₂ and (2) solving tgt by
adaptation of (pb, Sol(pb)) thanks to r₁. The idea is to find a
smallest family of adaptation rules, F, such that the closure of
F under composition contains the set S of the extracted adaptation
rules expressed in the form of FCIs. It is hoped that F
is much smaller than S and so requires less effort from the analyst
while corresponding to the same adaptation knowledge.

Another study on AKA for our CBR system was AKA from 
experts (based on the analysis of the adaptations performed 
by the experts). This AKA has led to a few adaptation rules 
and also to adaptation patterns, i.e. general strategies for
case-based decision support that are associated with explanations
but that need to be instantiated to become operational.
A third future work is mixed AKA, that is a combined use of 
the adaptation patterns and of the adaptation rules extracted 
from CabamakA: the idea is to try to instantiate the former 
by the latter in order to obtain a set of human-understandable 
and operational adaptation rules. 